30 research outputs found

    A Factor Graph Approach to Automated GO Annotation

    As the volume of genomic data grows, computational methods become essential for providing a first glimpse into gene annotations. Automated Gene Ontology (GO) annotation methods based on hierarchical ensemble classification techniques are particularly interesting when interpretability of annotation results is a main concern. In these methods, raw GO-term predictions computed by base binary classifiers are leveraged by checking the consistency of predefined GO relationships. Both formal leveraging strategies, focused mainly on annotation precision, and heuristic alternatives, focused mainly on scalability, have been described in the literature. In this contribution, a factor graph approach to the hierarchical ensemble formulation of the automated GO annotation problem is presented. In this formal framework, a core factor graph is first built based on the GO structure and then enriched to take into account the noisy nature of GO-term predictions. Hence, starting from raw GO-term predictions, an iterative message passing algorithm between nodes of the factor graph is used to compute marginal probabilities of target GO-terms. Evaluations on Saccharomyces cerevisiae, Arabidopsis thaliana and Drosophila melanogaster protein sequences from the GO Molecular Function domain showed significant improvements over competing approaches, even when protein sequences were naively characterized by their physicochemical and secondary structure properties or when loose noisy annotation datasets were considered. Based on these promising results and using Arabidopsis thaliana annotation data, we extend our approach to the identification of the most promising molecular function annotations for a set of proteins of unknown function in Solanum lycopersicum. Fil: Spetale, Flavio Ezequiel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. 
Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Krsticevic, Flavia Jorgelina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Roda, Fernando. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Bulacio, Pilar Estela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentin
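
The hierarchy-consistency idea described in the abstract above can be illustrated with a toy sketch. This is not the authors' factor graph implementation: the three term names, the child-to-parent relations, and the raw scores are hypothetical, and exact marginals are obtained by brute-force enumeration rather than by iterative message passing (on a graph this small the two should agree).

```python
from itertools import product

# Hypothetical toy hierarchy: child -> parent (a child term implies its parent).
PARENT = {"term_B": "term_A", "term_C": "term_A"}
TERMS = ["term_A", "term_B", "term_C"]


def consistent(assign):
    """A term may be positive only if its parent term is positive."""
    return all(not assign[child] or assign[parent]
               for child, parent in PARENT.items())


def marginals(raw):
    """raw[t] is a base classifier's probability that term t applies.

    Returns P(t = 1) after restricting to hierarchy-consistent labelings.
    """
    totals = {t: 0.0 for t in TERMS}
    z = 0.0  # normalizing constant over consistent labelings
    for bits in product([0, 1], repeat=len(TERMS)):
        assign = dict(zip(TERMS, bits))
        if not consistent(assign):
            continue
        weight = 1.0
        for t in TERMS:
            weight *= raw[t] if assign[t] else 1.0 - raw[t]
        z += weight
        for t in TERMS:
            if assign[t]:
                totals[t] += weight
    return {t: totals[t] / z for t in TERMS}


# Hypothetical raw scores: the child term_B scores higher than its parent.
raw_scores = {"term_A": 0.4, "term_B": 0.9, "term_C": 0.2}
post = marginals(raw_scores)
```

In this sketch the raw scores are inconsistent (a child scoring above its parent), whereas the resulting marginals always satisfy P(parent) >= P(child), which is the kind of consistency the abstract refers to.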

    Consistent prediction of GO protein localization

    The GO-Cellular Component (GO-CC) ontology provides a controlled vocabulary for the consistent description of the subcellular compartments or macromolecular complexes where proteins may act. Current machine learning-based methods used for the automated GO-CC annotation of proteins suffer from the inconsistency of individual GO-CC term predictions. Here, we present FGGA-CC+, a class of hierarchical graph-based classifiers for the consistent GO-CC annotation of protein coding genes at the subcellular compartment or macromolecular complex levels. To boost the accuracy of GO-CC predictions, we make use of the protein localization knowledge in the GO-Biological Process (GO-BP) annotations. As a result, FGGA-CC+ classifiers are built from annotation data in both the GO-CC and GO-BP ontologies. Due to their graph-based design, FGGA-CC+ classifiers are fully interpretable and their predictions amenable to expert analysis. Promising results on protein annotation data from five model organisms were obtained. Additionally, successful validation results in the annotation of a challenging subset of tandem duplicated genes in the tomato non-model organism were accomplished. Overall, these results suggest that FGGA-CC+ classifiers can indeed be useful for satisfying the huge demand of GO-CC annotation arising from ubiquitous high-throughput sequencing and proteomic projects. Fil: Spetale, Flavio Ezequiel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Arce, Debora Pamela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Investigaciones en Ciencias Agrarias de Rosario. 
Universidad Nacional de Rosario. Facultad de Ciencias Agrarias. Instituto de Investigaciones en Ciencias Agrarias de Rosario; ArgentinaFil: Krsticevic, Flavia Jorgelina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina. Universidad Tecnológica Nacional. Facultad Regional San Nicolás; ArgentinaFil: Bulacio, Pilar. Universidad Tecnológica Nacional. Facultad Regional San Nicolás; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Tapia, Elizabeth. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentin

    Proper integration of feature subsets boosts GO subcellular localization predictions

    Prediction of multiple subcellular localizations in proteins provides relevant information for biological function discovery. Knowledge-based computational methods can be a helpful starting point for guiding costly experimental validation. In this work, we present a multilabel classifier framework to perform Gene Ontology - Cellular Component prediction, focused on the improvement of two design aspects: i) the protein sequence characterization, relating biological knowledge to experimental evidence, and ii) the error evaluation, by considering a noise model inherent in real prediction frameworks. Our proposal is validated against sets of well-known protein sequences of four model organisms: D. rerio, A. thaliana, S. cerevisiae and D. melanogaster. Fil: Spetale, Flavio Ezequiel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. 
Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Tapia Elizabeth. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Murillo, Javier. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Krsticevic Flavia. Universidad Tecnológica Nacional. Facultad Regional San Nicolás; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Ponce Sergio. Universidad Tecnológica Nacional. Facultad Regional San Nicolás; ArgentinaFil: Angelone, Laura Monica. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Bulacio, Pilar Estela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina. 
Universidad Tecnológica Nacional. Facultad Regional San Nicolás; Argentin

    Targeting Hodgkin and Reed–Sternberg cells with an inhibitor of heat-shock protein 90: Molecular pathways of response and potential mechanisms of resistance

    Classical Hodgkin lymphoma (cHL) cells overexpress heat-shock protein 90 (HSP90), an important intracellular signaling hub regulating cell survival, which is emerging as a promising therapeutic target. Here, we report the antitumor effect of celastrol, an anti-inflammatory compound and a recognized HSP90 inhibitor, in Hodgkin and Reed-Sternberg cell lines. Two disparate responses were recorded. In KM-H2 cells, celastrol inhibited cell proliferation, induced G0/G1 arrest, and triggered apoptosis through the activation of caspase-3/7. Conversely, L428 cells exhibited resistance to the compound. A proteomic screening identified a total of 262 differentially expressed proteins in sensitive KM-H2 cells and revealed that celastrol’s toxicity involved the suppression of the MAPK/ERK (mitogen-activated protein kinase/extracellular signal-regulated kinase) pathway. The apoptotic effects were preceded by a decrease in RAS (proto-oncogene protein Ras), p-ERK1/2 (phospho-extracellular signal-regulated kinase-1/2), and c-Fos (proto-oncogene protein c-Fos) protein levels, as validated by immunoblot analysis. The L428 resistant cells exhibited a marked induction of HSP27 mRNA and protein after celastrol treatment. Our results provide the first evidence that celastrol has antitumor effects in cHL cells through the suppression of the MAPK/ERK pathway. Resistance to celastrol has rarely been described, and our results suggest that in cHL it may be mediated by the upregulation of HSP27. The antitumor properties of celastrol against cHL and whether the disparate responses observed in vitro have clinical correlates deserve further research. Fil: Segges, Priscilla. Instituto Nacional de Câncer; BrasilFil: Corrêa, Stephany. Instituto Nacional de Câncer; BrasilFil: Du Rocher, Bárbara. Fundación Oswaldo Cruz; Brasil. Instituto Nacional de Câncer; BrasilFil: Vera Lozada, Gabriela. Instituto Nacional de Câncer; BrasilFil: Krsticevic, Flavia Jorgelina. 
Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Arce, Debora Pamela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Investigaciones en Ciencias Agrarias de Rosario. Universidad Nacional de Rosario. Facultad de Ciencias Agrarias. Instituto de Investigaciones en Ciencias Agrarias de Rosario; ArgentinaFil: Sternberg, Cinthya. Universidade Federal do Rio de Janeiro; BrasilFil: Abdelhay, Eliana. Instituto Nacional de Câncer; BrasilFil: Hassan, Rocio. Instituto Nacional de Câncer; Brasi

    Regulatory motifs found in the small heat shock protein (sHSP) gene family in tomato

    Background: In living organisms, small heat shock proteins (sHSPs) are triggered in response to stress situations. This family of proteins is large in plants and, in the case of tomato (Solanum lycopersicum), 33 genes have been identified, most of them related to heat stress response and to the ripening process. Transcriptomic and proteomic studies have revealed complex patterns of expression for these genes. In this work, we investigate the coregulation of these genes by performing a computational analysis of their promoter architecture to find regulatory motifs known as heat shock elements (HSEs). We leverage the presence of sHSP members that originated from tandem duplication events and analyze the promoter architecture diversity of the whole sHSP family, focusing on the identification of HSEs. Results: We performed a search for conserved genomic sequences in the promoter regions of the sHSPs of tomato, plus several other proteins (mainly HSPs) that are functionally related to heat stress situations or to ripening. Several computational analyses were performed to build multiple sequence motifs and identify transcription factor binding sites (TFBS) homologous to HSF1AE and HSF21 in Arabidopsis. We also investigated the expression and interaction of these proteins under two heat stress situations in whole tomato plants and in protoplast cells, both in the presence and in the absence of heat shock transcription factor A2 (HsfA2). The results of these analyses indicate that different sHSPs are up-regulated depending on the activation or repression of HsfA2, a key regulator of HSPs. Further, the analysis of protein-protein interaction between the sHSP protein family and other heat shock response proteins (Hsp70, Hsp90 and MBF1c) suggests that several sHSPs are mediating alternative stress response through a regulatory subnetwork that is not dependent on HsfA2. 
Conclusions: Overall, this study identifies two regulatory motifs (HSF1AE and HSF21) associated with the sHSP family in tomato which are considered genomic HSEs. The study also suggests that, despite the apparent redundancy of these proteins, which has been linked to gene duplication, tomato sHSPs showed different up-regulation and different interaction patterns when analyzed under different stress situations.Fil: Arce, Debora Pamela. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Investigaciones en Ciencias Agrarias de Rosario. Universidad Nacional de Rosario. Facultad de Ciencias Agrarias. Instituto de Investigaciones en Ciencias Agrarias de Rosario; ArgentinaFil: Spetale, Flavio Ezequiel. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Krsticevic, Flavia. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Cacchiarelli, Paolo. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Investigaciones en Ciencias Agrarias de Rosario. Universidad Nacional de Rosario. Facultad de Ciencias Agrarias. Instituto de Investigaciones en Ciencias Agrarias de Rosario; ArgentinaFil: Las Rivas, Javier De. Universidad de Salamanca; EspañaFil: Ponce, Sergio. Universidad Tecnológica Nacional. Facultad Reg.san Nicolas. Secretaria de Ciencia y Tecnología; ArgentinaFil: Pratta, Guillermo. 
Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Instituto de Investigaciones en Ciencias Agrarias de Rosario. Universidad Nacional de Rosario. Facultad de Ciencias Agrarias. Instituto de Investigaciones en Ciencias Agrarias de Rosario; ArgentinaFil: Tapia, Elizabeth. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentin

    Robust and scalable barcoding for massively parallel long‑read sequencing

    Nucleic-acid barcoding is an enabling technique for many applications, but its use remains limited in emerging long-read sequencing technologies with intrinsically low raw accuracy. Here, we apply so-called NS-watermark barcodes, whose error correction capability was previously validated in silico, in a proof of concept where we synthesize 3840 NS-watermark barcodes and use them to asymmetrically tag and simultaneously sequence amplicons from two evolutionarily distant species (namely Bordetella pertussis and Drosophila mojavensis) on the ONT MinION platform. To our knowledge, this is the largest number of distinct, non-random tags ever sequenced in parallel and the first report of microarray-based synthesis as a source for large oligonucleotide pools for barcoding. We recovered the identity of more than 86% of the barcodes, with a crosstalk rate of 0.17% (i.e., one misassignment every 584 reads). This falls in the range of the index hopping rate of established, high-accuracy Illumina sequencing, despite the increased number of tags and the relatively low accuracy of both microarray-based synthesis and long-read sequencing. The robustness of NS-watermark barcodes, together with their scalable design and compatibility with low-cost massive synthesis, makes them promising for present and future sequencing applications requiring massive labeling, such as long-read single-cell RNA-Seq. Fil: Ezpeleta, Joaquín. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina.Fil: Labari, Ignacio Garcia. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina.Fil: Bulacio, Pilar. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina.Fil: Tapia, Elizabeth. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina.Fil: Ezpeleta, Joaquín. Universidad Nacional de Rosario. 
Facultad de Ciencias Exactas, Ingeniería y Agrimensura; Argentina.Fil: Bulacio, Pilar. Universidad Nacional de Rosario. Facultad de Ciencias Exactas, Ingeniería y Agrimensura; Argentina.Fil: Tapia, Elizabeth. Universidad Nacional de Rosario. Facultad de Ciencias Exactas, Ingeniería y Agrimensura; Argentina.Fil: Villanova, Gabriela Vanina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina.Fil: Lavista Llanos, Sofía. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina.Fil: Villanova, Gabriela Vanina. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Laboratorio Mixto de Biotecnología Acuática. Centro Científico Tecnológico y Educativo Acuario del Río Paraná; Argentina.Fil: Posner, Victoria María. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Laboratorio Mixto de Biotecnología Acuática. Centro Científico Tecnológico y Educativo Acuario del Río Paraná; Argentina.Fil: Arranz, Silvia Eda. Universidad Nacional de Rosario. Facultad de Ciencias Bioquímicas y Farmacéuticas. Laboratorio Mixto de Biotecnología Acuática. Centro Científico Tecnológico y Educativo Acuario del Río Paraná; Argentina.Fil: Krsticevic, Flavia. The Hebrew University of Jerusalem. Robert H Smith Faculty of Agriculture, Food and Environment; Israel
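
As a quick check of the figures quoted in the abstract above, the following minimal sketch recomputes the crosstalk rate from "one misassignment every 584 reads" and the lower bound on recovered barcodes implied by "more than 86%" of 3840. The function and variable names are illustrative only and do not come from the paper.

```python
import math


def crosstalk_percent(misassigned_reads, total_reads):
    """Crosstalk rate as a percentage of reads assigned to the wrong barcode."""
    return 100.0 * misassigned_reads / total_reads


# One misassignment every 584 reads, as stated in the abstract.
rate = round(crosstalk_percent(1, 584), 2)

# "More than 86%" of the 3840 synthesized barcodes were recovered,
# so at least this many barcodes were identified.
min_recovered = math.floor(0.86 * 3840)
```

Here `rate` evaluates to 0.17, matching the reported 0.17% crosstalk, and `min_recovered` evaluates to 3302.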

    Improved reference genome of the arboviral vector Aedes albopictus

    Background: The Asian tiger mosquito Aedes albopictus is globally expanding and has become the main vector for human arboviruses in Europe. With limited antiviral drugs and vaccines available, vector control is the primary approach to prevent mosquito-borne diseases. A reliable and accurate DNA sequence of the Ae. albopictus genome is essential to develop new approaches that involve genetic manipulation of mosquitoes. Results: We use long-read sequencing methods and modern scaffolding techniques (PacBio, 10X, and Hi-C) to produce AalbF2, a dramatically improved assembly of the Ae. albopictus genome. AalbF2 reveals widespread viral insertions, novel microRNAs and piRNA clusters, the sex-determining locus, and new immunity genes, and enables genome-wide studies of geographically diverse Ae. albopictus populations and analyses of the developmental and stage-dependent network of expression data. Additionally, we build the first physical map for this species with 75% of the assembled genome anchored to the chromosomes. Conclusion: The AalbF2 genome assembly represents the most up-to-date collective knowledge of the Ae. albopictus genome. These resources represent a foundation to improve understanding of the adaptation potential and the epidemiological relevance of this species and foster the development of innovative control measures

    Tandem duplication events in the expansion of the small heat shock protein gene family in Solanum lycopersicum (cv. Heinz 1706)

    In plants, fruit maturation and oxidative stress can induce small heat shock protein (sHSP) synthesis to maintain cellular homeostasis. Although the tomato reference genome was published in 2012, the actual number and functionality of sHSP genes remain unknown. Using a transcriptomic (RNA-seq) and evolutionary genomic approach, putative sHSP genes in the Solanum lycopersicum (cv. Heinz 1706) genome were investigated. An sHSP gene family of 33 members was established. Remarkably, roughly half of the members of this family can be explained by nine independent tandem duplication events that determined, evolutionarily, their functional fates. Within a mitochondrial class subfamily, only one duplicated member, Solyc08g078700, retained its ancestral chaperone function, while the others, Solyc08g078710 and Solyc08g078720, likely degenerated under neutrality and lack ancestral chaperone function. Functional conservation occurred within a cytosolic class I subfamily, whose four members, Solyc06g076570, Solyc06g076560, Solyc06g076540, and Solyc06g076520, support ~57% of the total sHSP mRNA in the red ripe fruit. Subfunctionalization occurred within a new subfamily, whose two members, Solyc04g082720 and Solyc04g082740, show heterogeneous differential expression profiles during fruit ripening. These findings, involving the birth/death of some genes or the preferential/plastic expression of some others during fruit ripening, highlight the importance of tandem duplication events in the expansion of the sHSP gene family in the tomato genome. Despite its evolutionary diversity, the sHSP gene family in the tomato genome seems to be endowed with a core set of four homeostasis genes: Solyc05g014280, Solyc03g082420, Solyc11g020330, and Solyc06g076560, which appear to provide a baseline protection during both fruit ripening and heat shock stress in different tomato tissues. Fil: Krsticevic, Flavia Jorgelina. Universidad Tecnológica Nacional. Facultad Regional San Nicolás; Argentina. 
Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Arce, Debora Pamela. Universidad Tecnológica Nacional. Facultad Regional San Nicolás; Argentina. Universidad Nacional de Rosario. Facultad de Ciencias Agrarias; ArgentinaFil: Ezpeleta, Joaquin. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina. Universidad Nacional de Rosario. Facultad de Ciencias Exactas, Ingeniería y Agrimensura; ArgentinaFil: Tapia Paredes, Elizabeth. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; Argentina. Universidad Nacional de Rosario. Facultad de Ciencias Exactas, Ingeniería y Agrimensura; Argentin

    Long-read single molecule sequencing to resolve tandem gene copies: The Mst77Y region on the Drosophila melanogaster Y chromosome

    The autosomal gene Mst77F of Drosophila melanogaster is essential for male fertility. In 2010, Krsticevic et al. (Genetics 184: 295–307) found 18 Y-linked copies of Mst77F ("Mst77Y"), which collectively account for 20% of the functional Mst77F-like mRNA. The Mst77Y genes were severely misassembled in the then-available genome assembly and were identified by cloning and sequencing polymerase chain reaction products. The genomic structure of the Mst77Y region and the possible existence of additional copies remained unknown. The recent publication of two long-read assemblies of D. melanogaster prompted us to reinvestigate this challenging region of the Y chromosome. We found that the Illumina Synthetic Long Reads assembly failed in the Mst77Y region, most likely because of its tandem duplication structure. The PacBio MHAP assembly of the Mst77Y region seems to be very accurate, as revealed by comparisons with the previously found Mst77Y genes, a bacterial artificial chromosome sequence, and Illumina reads of the same strain. We found that the Mst77Y region spans 96 kb and originated from a 3.4-kb transposition from chromosome 3L to the Y chromosome, followed by tandem duplications inside the Y chromosome and invasion of transposable elements, which account for 48% of its length. Twelve of the 18 Mst77Y genes found in 2010 were confirmed in the PacBio assembly, the remaining six being polymerase chain reaction-induced artifacts. There are several identical copies of some Mst77Y genes, coincidentally bringing the total copy number to 18. Besides providing a detailed picture of the Mst77Y region, our results highlight the utility of PacBio technology in assembling difficult genomic regions such as tandemly repeated genes. Fil: Krsticevic, Flavia Jorgelina. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas. 
Universidad Nacional de Rosario. Centro Internacional Franco Argentino de Ciencias de la Información y de Sistemas; ArgentinaFil: Schrago, Carlos G.. Universidade Federal do Rio de Janeiro; BrasilFil: Carvalho, A. Bernardo. Universidade Federal do Rio de Janeiro; Brasi